ABSTRACT

Multi-sample cluster analysis, the problem of grouping samples, is
studied from an information-theoretic viewpoint via Akaike's Information
Criterion (AIC). This criterion combines the maximum value of the likelihood
with the number of parameters used in achieving that value. The multi-sample
cluster problem is defined, and AIC is developed for this problem.

The form of AIC is derived in both univariate and multivariate analysis
of variance models. Numerical examples are presented and results are shown to
demonstrate the utility of AIC in identifying the best clustering alternatives.

Key Words and Phrases: Multi-sample cluster analysis; Akaike's Information
Criterion (AIC); ANOVA model; MANOVA model; maximum likelihood.

1. Introduction

In this paper, we shall develop Akaike's Information Criterion (AIC) for
multi-sample cluster analysis. The problem of multi-sample cluster analysis
arises when we are given a collection of samples (groups, treatments) to be
clustered into homogeneous groups.

It is reasonable to provide a practically useful statistical procedure
that would use some sort of statistical model to aid in comparisons of various
collections of samples, identify homogeneous groups of samples, and tell us
which samples should be clustered together and which should not.

Examples  of  multi-sample  clustering  situations  are  abundant.  Here  we 
mention  a  few. 

Example 1.1. Botany: grouping of three species of Iris, namely Iris
setosa (S), Iris versicolor (Ve), and Iris virginica (Vi), given in Example 6.2
and Table 6.3 in Section 6, on the basis of each and of all the four variables.

Example 1.2. Zoology: grouping of geographical locations to study the
differences of populations of two species of Crocidura. Delany and Healy
[7] studied variation in white-toothed shrews, which are nocturnal mammals, in
the British Isles. White-toothed shrews of genus Crocidura occur in the
Channel and Scilly Islands of the British Isles and the French mainland. There
were p = 10 measurements on each of n = 399 skulls obtained from the K = 10
locations: Tresco, Bryher, St. Agnes, St. Martin's, St. Mary's, Sark, Jersey,
Alderney, Guernsey, and Cap Gris Nez. The sample sizes for the data from the
ten locations are, respectively, $n_1 = 144$, $n_2 = 16$, $n_3 = 12$, $n_4 = 7$,
$n_5 = 90$, $n_6 = 25$, $n_7 = 6$, $n_8 = 26$, $n_9 = 53$, and $n_{10} = 20$.
Attempts were made to analyze the pattern of variation between these ten
populations to examine the belief that there may be two species of Crocidura,
namely Crocidura russula and Crocidura suaveolens. The locations were
geographically close, but it is assumed that only one sub-species was present
in any one place. Thus the problem here is to cluster the locations, that is,
the "samples", into homogeneous groups to discover the origin of the two
species.

Example 1.3. Air and Water Pollution: grouping of weather class types or
nitrate sites to distinguish whether the source of nitrate is weather type or
local. Heidorn [12] studied synoptic, that is, general, weather patterns
associated with nitrates in southern Ontario. In recent years, there has been
growing concern over the potential hazard of particulate nitrate in the
atmosphere, which acts as a respiratory irritant, especially to those who have
asthma problems. Nitrate is also suspected to lower the pH level in freshwater
lakes.

A sample of n = 17 cities across southern Ontario, from Windsor in the west
to Kingston in the east, was chosen as the location of nitrate sites. Nitrate
concentrations for the 17 sites were measured. In order to determine the
effect of weather patterns on the measurement of nitrate, eight weather class
types were defined for the nitrate sites. Thus the problem here is to cluster
the weather class types or the sites into homogeneous groups to determine
whether the source of particulate nitrate is due to weather class type or is
local.

Example 1.4. Business and Economics: grouping of corporations by their
financial characteristics. Chen et al. [6], Williams and Goodman [16], and
others studied statistical methods for clustering corporations on the
basis of yearly data concerning several of their financial characteristics.
Thus the general problem here is to cluster the sets of corporations in order
to detect, describe, and distinguish relatively homogeneous groups of companies
so that the formation of the groups and organizational behavior of companies
can be studied and compared.

So,  as  we  see,  multi-sample  cluster  analysis  examples  are  quite  rich  and 
varied. 

The analysis of variance (ANOVA) is a widely used model for comparing two
or more univariate samples, where the familiar Student's t and F statistics are
used for formal comparisons among two or more samples. Multivariate analysis
of variance (MANOVA) is a widely used model for comparing two or more
multivariate samples. In the MANOVA model, the likelihood ratio principle leads
to Wilks' [17] lambda, or in short Wilks' $\Lambda$ criterion, as the test
statistic. It plays the same role in multivariate analysis that the F-ratio
statistic plays in the univariate case.

Often, however, the formal analyses involved in MANOVA are not revealing
or informative. Therefore, in this paper we shall propose Akaike's Information
Criterion (AIC) as a new procedure for comparing the clusters, and use it
to identify the best clustering alternatives.

In 1971, Akaike first introduced an information criterion, referred to as
an automatic (model) identification criterion or Akaike's information criterion
(AIC), for the identification and comparison of statistical models in a class
of competing models with different numbers of parameters. It is defined by

(1.1)  $\mathrm{AIC} = -2 \log_e(\text{maximized likelihood}) + 2\,(\text{number of independently adjusted parameters within the model}).$

It was obtained with the aid of an information-theoretic interpretation of the
method of maximum likelihood by Akaike ([2], [3]). It estimates minus twice
the expected log likelihood of the model whose parameters are determined by the
method of maximum likelihood. When several competing models are being compared
or fitted, AIC is a simple procedure which measures the badness of fit or the
discrepancy of the estimated model from the true model when a set of data is
given.

The first term in (1.1) stands for the penalty of badness of fit or
downward bias when the maximum likelihood estimators of the parameters of the
model are used. The second term in the definition of AIC, on the other hand,
stands for the penalty of increased unreliability, or compensation for the bias
in the first term, as a consequence of an increasing number of parameters. If
more parameters are used to describe the data, it is natural to get a larger
likelihood, possibly without improving the true goodness of fit. AIC guards
against this by penalizing the use of additional parameters.

Thus, when there are several competing models, the parameters within the
models are estimated by the method of maximum likelihood and the AIC values are
computed and compared to find a model with the minimum value of AIC. This
procedure is called the minimum AIC procedure. The model with the minimum AIC
is called the minimum AIC estimate (MAICE) and is designated as the best model.
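
As an illustration of the minimum AIC procedure just described, the following is a minimal Python sketch. The function names `aic` and `maice`, the model labels, and the maximized log likelihoods are hypothetical and chosen only for illustration; they do not come from the data analyzed in this paper.

```python
from typing import Dict, Tuple


def aic(max_log_likelihood: float, n_params: int) -> float:
    """AIC as in (1.1): -2 log(maximized likelihood) + 2 (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


def maice(candidates: Dict[str, Tuple[float, int]]) -> Tuple[str, float]:
    """Return the name and AIC of the candidate with the smallest AIC."""
    scores = {name: aic(loglik, m) for name, (loglik, m) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical (maximized log likelihood, number of parameters) pairs.
models = {"A": (-120.0, 2), "B": (-118.5, 3), "C": (-118.4, 5)}
best_name, best_aic = maice(models)
print(best_name, best_aic)
```

In this made-up comparison, model C has the largest likelihood but also the most parameters, so the penalty term can outweigh its small gain in fit.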

In Section 2, we shall define the general multi-sample cluster problem,
and in Section 3, we shall briefly discuss the number of clustering
alternatives for a given K groups or samples into k nonempty clusters. In the
subsequent sections, that is, in Sections 4 and 5, we shall derive the AIC
procedure for the univariate analysis of variance (ANOVA) model and the
multivariate analysis of variance (MANOVA) model. In Section 6, we shall give
numerical examples for both univariate and multivariate multi-sample cluster
analysis on real data sets to demonstrate our results of AIC and minimum AIC
procedures obtained from different computer analyses.

2.  The  Multi-Sample  Cluster  Problem 

Suppose each individual, object, or case has been measured on p response
or outcome measures (dependent variables) simultaneously in K independent
groups or samples (factor levels). Let

(2.1)  $X_{(n \times p)} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_K \end{bmatrix}$

be a single data matrix of K groups or samples, where $X_g$ ($n_g \times p$)
represents the observations from the g-th group or sample, $g = 1, 2, \ldots, K$,
and $n = \sum_{g=1}^{K} n_g$. The goal of cluster analysis is to put the K
groups or samples into k homogeneous groups, samples, or classes, where k is
unknown, but $k \le K$.

Often individuals or objects have been sampled from K > 1 populations. For
multi-samples or multiple groups of individuals or objects the data matrix may
be represented in partitioned form as above. Let $n_g$ represent the number of
individuals in the g-th (random) sample, $g = 1, 2, \ldots, K$. The $n_g$ are not
restricted to being equal or proportional to other $n_g$'s. The total number of
observations is $n = \sum_{g=1}^{K} n_g$. Let $x_{gi}$ be the $p \times 1$ vector
of observations in group $g = 1, 2, \ldots, K$, and for individual
$i = 1, 2, \ldots, n_g$.


3. The Number of Clustering Alternatives for a Given K
Samples into k Nonempty Clusters

In  this  section,  we  shall  briefly  discuss  how  to  obtain  the  total  number 
of  clustering  alternatives  for  a  given  K,  the  number  of  groups  or  samples.  For 
this,  we  shall  recall  some  established  results. 

Theorem 3.1. The number of ways of clustering K groups or samples into k
clusters such that none of the k clusters is empty is given by

(3.1)  $\sum_{g=0}^{k} (-1)^g \binom{k}{g} (k-g)^K,$

where the order of groups or samples within each cluster is irrelevant.

Proof. Duran and Odell [9].

In this theorem the k clusters are assumed to be distinct. However, in
clustering K groups or samples into k clusters, none of which is empty, the
order of the k clusters is irrelevant. Consequently, from this fact and
Theorem 3.1, it follows that the total number of ways of clustering K groups or
samples into k clusters is given by

(3.2)  $S(K,k) = \frac{1}{k!} \sum_{g=0}^{k} (-1)^g \binom{k}{g} (k-g)^K,$

which is known as the Stirling Number of the Second Kind (see, e.g., Abramowitz
and Stegun [1]) and is also called the number of clustering alternatives.

If k, the number of clusters of groups or samples, is known in advance,
then the total number of clustering alternatives is given by S(K,k). However,
if k is not specified a priori and varies, then the total number of clustering
alternatives for a given K, the number of groups or samples, is given by

(3.3)  $\sum_{k=1}^{K} S(K,k).$


Table  3.1  gives  S(K,k)  for  values  of  K  and  k  up  to  10. 


TABLE 3.1. NUMBER OF CLUSTERING ALTERNATIVES FOR VARIOUS VALUES OF K AND k

                                      k
 K      1      2      3      4      5      6      7      8      9     10    Total
 1      1                                                                       1
 2      1      1                                                                2
 3      1      3      1                                                         5
 4      1      7      6      1                                                 15
 5      1     15     25     10      1                                          52
 6      1     31     90     65     15      1                                  203
 7      1     63    301    350    140     21      1                           877
 8      1    127    966   1701   1050    266     28      1                   4140
 9      1    255   3025   7770   6951   2646    462     36      1           21147
10      1    511   9330  34105  42525  22827   5880    750     45      1   115975
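
The counts S(K,k) in (3.2) and the totals in (3.3) can be computed directly from the formulas. The following is a minimal Python sketch; the function names `stirling2` and `total_alternatives` are illustrative choices, not part of the original analysis.

```python
from math import comb, factorial


def stirling2(K: int, k: int) -> int:
    """S(K, k) from (3.2): (1/k!) * sum_{g=0}^{k} (-1)^g C(k, g) (k - g)^K."""
    total = sum((-1) ** g * comb(k, g) * (k - g) ** K for g in range(k + 1))
    # The alternating sum counts labeled clusterings, so it is divisible by k!.
    return total // factorial(k)


def total_alternatives(K: int) -> int:
    """Total number of clustering alternatives for K samples, as in (3.3)."""
    return sum(stirling2(K, k) for k in range(1, K + 1))


for K in (3, 4):
    row = [stirling2(K, k) for k in range(1, K + 1)]
    print(K, row, total_alternatives(K))
```

For K = 3 this gives the row 1, 3, 1 with total 5, and for K = 4 the row 1, 7, 6, 1 with total 15.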

Consider, for example, K = 3 samples. We now wish to cluster K = 3 groups or
samples first into k = 3 groups or samples, then into k = 2 groups or samples,
and k = 1 group or sample in a hierarchical fashion.

From Table 3.1, the total number of ways of clustering K = 3 groups
or samples into k = 3 homogeneous groups or samples is 1. The total number of
ways of clustering K = 3 groups or samples into k = 2 homogeneous groups or
samples is 3. The total number of ways of clustering K = 3 groups or samples
into k = 1 homogeneous group or sample is 1. Thus, adding up these results, we
obtain in total 5 clustering alternatives for K = 3 groups or samples into
k = 1, 2, and 3 homogeneous groups. We note that 5 is nothing but the sum of
the values of row 3 in Table 3.1.

The 5 clustering alternatives can be classified according to their
representation forms to make it easy to list all 5 possible clustering
alternatives. The representation forms in this case are denoted by

(i)    {1} {1} {1},

(ii)   {2} {1},

(iii)  {3},

where each of the components in a representation {g} denotes the number, g, of
groups or samples in the corresponding cluster. The components of a
representation form will always be written in a hierarchical order to depict
the patterns of clustering alternatives. In our example there are 5 clustering
alternatives but only 3 representation forms. In general the number of
representation forms is much smaller than the number of clustering alternatives.

We now list the clustering alternatives corresponding to their
representation forms in Table 3.2 as follows:


TABLE 3.2. A SIMPLE PATTERN OF CLUSTERING ALTERNATIVES
WHEN K = 3 AND k = 3, 2, AND 1

Alternatives    Clustering      Number of Parameters, m
     1          (1) (2) (3)               3
     2          (1 2) (3)                 2
     3          (1 3) (2)                 2
     4          (2 3) (1)                 2
     5          (1 2 3)                   1

For example, in alternative one, the groups or samples 1, 2, and 3 are
clustered as singletons. In terms of a hypothesis on means, this corresponds
to $\mu_1$, $\mu_2$, and $\mu_3$ all being different, and therefore the number
of parameters, m, is equal to 3, indicating that groups 1, 2, and 3 are all
heterogeneous. In alternative two, groups or samples 1 and 2 are clustered
together, forming a homogeneous subset, and group or sample 3 is clustered
alone, forming a heterogeneous subset. In terms of a hypothesis on means, this
corresponds to $\mu_1 = \mu_2$, and $\mu_3$ different from both $\mu_1$ and
$\mu_2$, with the total number of parameters m being equal to 2. In a similar
fashion, we interpret the other clustering alternatives continuing down the
line of Table 3.2.

As a last example, we shall just list the total number of possible
clustering alternatives when K = 4 groups or samples in Table 3.3 as
follows.


TABLE 3.3. A SIMPLE PATTERN OF CLUSTERING ALTERNATIVES
WHEN K = 4 AND k = 4, 3, 2, AND 1

Alternatives    Clustering           Number of Parameters, m
     1          (1) (2) (3) (4)                4
     2          (1 2) (3) (4)                  3
     3          (1 3) (2) (4)                  3
     4          (1 4) (2) (3)                  3
     5          (2 3) (1) (4)                  3
     6          (2 4) (1) (3)                  3
     7          (3 4) (1) (2)                  3
     8          (1 2) (3 4)                    2
     9          (1 3) (2 4)                    2
    10          (1 4) (2 3)                    2
    11          (1 2 3) (4)                    2
    12          (1 2 4) (3)                    2
    13          (1 3 4) (2)                    2
    14          (2 3 4) (1)                    2
    15          (1 2 3 4)                      1

In concluding this section, we see that in general the total number of
ways of clustering K groups or samples into k homogeneous groups or samples is
given by equation (3.2), and the total number of possible clustering
alternatives is given by the expression (3.3).
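
Listing the clustering alternatives themselves, rather than only counting them, can be sketched with a short recursive generator. This is an illustrative Python sketch, not the procedure used for the tables in this section; the names `partitions` and `label` are hypothetical, and the clusters within an alternative may be printed in a different order than in Tables 3.2 and 3.3.

```python
from collections import Counter
from typing import Iterator, List


def partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every way of splitting `items` into nonempty, unordered clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i, cluster in enumerate(smaller):
            yield smaller[:i] + [[first] + cluster] + smaller[i + 1:]
        # ... or let it start a cluster of its own.
        yield [[first]] + smaller


def label(partition: List[List[int]]) -> str:
    """Format a partition in the style '(1 2) (3)'."""
    clusters = sorted(sorted(c) for c in partition)
    return " ".join("(" + " ".join(str(s) for s in c) + ")" for c in clusters)


for alt in partitions([1, 2, 3]):
    # len(alt) is the number of clusters, i.e. the number of distinct means.
    print(label(alt), "m =", len(alt))

# Number of alternatives with k clusters when K = 4.
counts_for_four = Counter(len(alt) for alt in partitions([1, 2, 3, 4]))
```

For K = 3 the generator produces 5 alternatives, and for K = 4 it produces 15, grouped as 1, 7, 6, and 1 alternatives with k = 1, 2, 3, and 4 clusters.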

4.  AIC  For  The  Univariate  Model 


We now turn our attention to situations with several univariate
normal samples. The general layout for such data (one-way ANOVA) is
represented in the following tabular form.


TABLE 4.1. GENERAL DATA REPRESENTATION FOR ONE-WAY ANOVA

                                 Groups
                   1             2             ...     K
Observations       $z_{11}$      $z_{21}$      ...     $z_{K1}$
                   $z_{12}$      $z_{22}$      ...     $z_{K2}$
                   ...           ...           ...     ...
                   $z_{1n_1}$    $z_{2n_2}$    ...     $z_{Kn_K}$
TOTALS             $T_1$         $T_2$         ...     $T_K$
SAMPLE SIZES       $n_1$         $n_2$         ...     $n_K$        $n = \sum_{g=1}^{K} n_g$
SAMPLE MEANS       $\bar{z}_{1\cdot}$    $\bar{z}_{2\cdot}$    ...     $\bar{z}_{K\cdot}$
VARIANCES          $s_1^2$       $s_2^2$       ...     $s_K^2$

For example, we may have multi-sample data with samples of sizes
$n_1, n_2, \ldots, n_K$ which are assumed to have come from K populations, the
first with mean $\mu_1$ and variance $\sigma^2$, the second with mean $\mu_2$
and variance $\sigma^2$, ..., the K-th with mean $\mu_K$ and variance
$\sigma^2$. We may want to compare these K group or sample means
$\mu_1, \mu_2, \ldots, \mu_K$, given that all have a common $\sigma^2$. Hence,
this is the well known analysis of variance (ANOVA) model. In terms of the
parameters, the ANOVA model is $\theta = (\mu_1, \mu_2, \ldots, \mu_k, \sigma^2)$
with m = k+1 parameters, where k is the number of groups.

We shall derive the form of AIC for this model. Recall the definition of
AIC from Section 1,

$\mathrm{AIC} = -2 \log_e L(\hat{\theta}) + 2m = -2 \log_e(\text{maximized likelihood}) + 2m,$

where m denotes the number of independently adjusted parameters within the
model.

Suppose there are K independent samples of independent observations, with
$n_g$, $g = 1, 2, \ldots, K$, observations in the g-th group and
$n = \sum_{g=1}^{K} n_g$. Denote the unknown means of the groups by
$\mu_1, \mu_2, \ldots, \mu_K$. Assume that the samples
$(z_{11}, z_{12}, \ldots, z_{1n_1}; \ldots; z_{K1}, \ldots, z_{Kn_K})$ are drawn
randomly from K populations which are $N(\mu_g, \sigma^2)$. If the groups can
differ only in their means, we may express this as

(4.1)  $z_{gi} = \mu_g + \varepsilon_{gi}, \quad g = 1, 2, \ldots, K; \; i = 1, 2, \ldots, n_g,$

where $z_{gi}$ is the value of the response or outcome variable in the g-th
             group for the i-th individual or object,
      $\mu_g$ are parameters,
      $\varepsilon_{gi}$ are independent $N(0, \sigma^2)$ error variables.
This equation is called the one-way ANOVA model.

Thus, the basic null hypothesis of interest in this case is given by

(4.2)  $H_0 : \mu_1 = \mu_2 = \cdots = \mu_K.$

The alternative hypothesis is given by

$H_1$ : the K population means are not all equal.

Every analysis of variance involves a partitioning of the total sum of
squares of deviations, SST, into the within-group sum of squares of
deviations, SSW, and the between-group sum of squares of deviations, SSB. For
more details on this, we refer the reader to any basic text on statistics,
e.g., Anderson and Sclove [4].

We now derive the form of Akaike's Information Criterion (AIC) for the
one-way ANOVA model given in (4.1).

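The likelihood function is given by

(4.3)  $L(\{\mu_g\}, \sigma^2; z) = (2\pi\sigma^2)^{-n/2} \exp\left[-\sum_{g=1}^{K} \sum_{i=1}^{n_g} (z_{gi} - \mu_g)^2 / (2\sigma^2)\right].$

The log likelihood function is

(4.4)  $l(\{\mu_g\}, \sigma^2; z) \equiv \log L(\{\mu_g\}, \sigma^2; z) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \sum_{g=1}^{K} \sum_{i=1}^{n_g} (z_{gi} - \mu_g)^2 / (2\sigma^2).$

As is well known, the MLE's are

(4.5)  $\hat{\mu}_g = \bar{z}_{g\cdot} = \frac{1}{n_g} \sum_{i=1}^{n_g} z_{gi}, \quad g = 1, 2, \ldots, K,$

and

(4.6)  $\hat{\sigma}^2 = \frac{1}{n} \sum_{g=1}^{K} \sum_{i=1}^{n_g} (z_{gi} - \bar{z}_{g\cdot})^2 = \frac{\mathrm{SSW}}{n},$

where $\mathrm{SSW} = \sum_{g=1}^{K} \sum_{i=1}^{n_g} (z_{gi} - \bar{z}_{g\cdot})^2$
is the Within Group Sum of Squares.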

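Substituting these back into (4.4), we have

$l(\{\hat{\mu}_g\}, \hat{\sigma}^2; z) \equiv \log L(\{\hat{\mu}_g\}, \hat{\sigma}^2; z) = -\frac{n}{2}\left[\log(2\pi) + \log\frac{\mathrm{SSW}}{n}\right] - \frac{n}{2}.$

Since

(4.7)  $\mathrm{AIC} = -2 \log_e L(\hat{\theta}) + 2m,$

where m is the number of parameters, and since

(4.8)  $-2 \log L(\{\hat{\mu}_g\}, \hat{\sigma}^2) = n \log(2\pi) + n \log\frac{\mathrm{SSW}}{n} + n,$

then AIC becomes

(4.9)  $\mathrm{AIC} = n \log(2\pi) + n \log\frac{\mathrm{SSW}}{n} + n + 2(k+1).$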

Since the constants do not affect the result of comparison of models, we
could ignore them and use the simplified version

(4.10)  $\mathrm{AIC}^* = n \log_e \mathrm{SSW} + 2(k+1),$

where $n = \sum_{g=1}^{K} n_g$ = the total sample size,

      SSW = Within Group Sum of Squares, and

      k = number of groups or samples compared, so that m = k+1 is the
          number of independently adjusted parameters within the model.

However, for purposes of comparison we retain the constants and use AIC.
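
To make the use of (4.9) concrete, the following is a minimal Python sketch that evaluates AIC for a given clustering of univariate samples. The function names `ssw` and `anova_aic` and the three small samples are hypothetical, chosen only to illustrate the arithmetic; each clustering alternative is passed as a list of clusters, with merged samples pooled into one list.

```python
import math
from typing import Sequence


def ssw(clusters: Sequence[Sequence[float]]) -> float:
    """Within-cluster sum of squares: deviations from each cluster's own mean."""
    total = 0.0
    for values in clusters:
        mean = sum(values) / len(values)
        total += sum((z - mean) ** 2 for z in values)
    return total


def anova_aic(clusters: Sequence[Sequence[float]]) -> float:
    """AIC as in (4.9): n log(2 pi) + n log(SSW / n) + n + 2 (k + 1)."""
    n = sum(len(values) for values in clusters)
    k = len(clusters)  # number of clusters, each with its own mean
    return n * math.log(2 * math.pi) + n * math.log(ssw(clusters) / n) + n + 2 * (k + 1)


# Hypothetical samples: 1 and 2 have similar means, 3 is clearly different.
s1, s2, s3 = [1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [10.0, 11.0, 12.0]

aic_separate = anova_aic([s1, s2, s3])     # (1) (2) (3), k = 3
aic_12_3 = anova_aic([s1 + s2, s3])        # (1 2) (3),   k = 2
aic_all = anova_aic([s1 + s2 + s3])        # (1 2 3),     k = 1
print(aic_separate, aic_12_3, aic_all)
```

With these made-up samples, pooling samples 1 and 2 raises SSW only slightly while saving one parameter, so (1 2) (3) attains the smallest AIC of the three alternatives shown.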

5.  AIC  For  the  Multivariate  Model 

In this section we shall study the natural extension of the univariate
model we considered in Section 4 to its multivariate analogue. Therefore,
throughout this section we shall suppose that we have independent data
matrices $X_1, X_2, \ldots, X_K$, where the rows of $X_g$ ($n_g \times p$) are
independent and identically distributed (i.i.d.) $N_p(\mu_g, \Sigma_g)$,
$g = 1, 2, \ldots, K$. In terms of the parameters
$\theta = (\mu_1, \mu_2, \ldots, \mu_k, \Sigma)$, the model we shall consider
here is the one with a common covariance matrix,

$\Sigma_1 = \Sigma_2 = \cdots = \Sigma_K = \Sigma,$

with m = kp + p(p+1)/2 parameters, where k is the number of groups, and p is
the number of variables.

As in the univariate case, consider K normal populations with different
mean vectors $\mu_g$, $g = 1, 2, \ldots, k, \ldots, K$. Let $z_{gi}$,
$g = 1, 2, \ldots, K$; $i = 1, 2, \ldots, n_g$, be a random sample of
observations from the g-th population $N_p(\mu_g, \Sigma)$. If the
groups or samples can differ only in their mean vectors, we can write the
multivariate one-way analysis of variance (MANOVA) model as

(5.1)  $z_{gi} = \mu_g + \varepsilon_{gi}, \quad g = 1, \ldots, K; \; i = 1, 2, \ldots, n_g,$

where $z_{gi}$ is the ($p \times 1$) response or outcome vector in the g-th
             group for the i-th individual or object,
      $\mu_g$ are vector parameters, and
      $\varepsilon_{gi}$ are independent $N_p(0, \Sigma)$ random vector errors.

Thus, the basic null hypothesis we usually are interested in testing is
given by

(5.2)  $H_0 : \mu_1 = \mu_2 = \cdots = \mu_K.$

The alternative hypothesis is given by

$H_1$ : not all $\mu_g$ are equal.

Wilks' lambda is a general statistic for handling this problem.
Although there are several other conventional statistics for this purpose,
they all can be viewed as special cases of Wilks' $\Lambda$, which we shall
not discuss here.

For notational purposes, we shall denote T to be the "total" sum of
squares and products (SSP) matrix, W to be the "within-group" or
"within-sample" SSP matrix, and B to be the "between-group" SSP matrix. Hence,
it can be shown that

(5.3)  $T = W + B,$

where

(5.4)  $T = \sum_{g=1}^{K} \sum_{i=1}^{n_g} (z_{gi} - \bar{z})(z_{gi} - \bar{z})',$

(5.5)  $W = \sum_{g=1}^{K} \sum_{i=1}^{n_g} (z_{gi} - \bar{z}_g)(z_{gi} - \bar{z}_g)',$

and

(5.6)  $B = \sum_{g=1}^{K} n_g (\bar{z}_g - \bar{z})(\bar{z}_g - \bar{z})',$

with

$\bar{z}_g = \frac{1}{n_g} \sum_{i=1}^{n_g} z_{gi}, \quad g = 1, 2, \ldots, K,$

$\bar{z} = \frac{1}{n} \sum_{g=1}^{K} \sum_{i=1}^{n_g} z_{gi}, \quad n = \sum_{g=1}^{K} n_g.$

Therefore, we can present the multivariate one-way analysis of variance
(MANOVA) table as follows.


TABLE 5.1. MANOVA TABLE

Source             d.f.     SSP matrix     Wilks' criterion
Between samples    K-1      B
                                           $|W| / |W + B| \sim \Lambda(p;\, n-K;\, K-1)$
Within samples     n-K      W
Total              n-1      T

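Now, we derive the form of Akaike's Information Criterion (AIC) for the
MANOVA model given in (5.1), subject to the constraint given in (5.2). The
likelihood function of all the sample observations is given by

(5.7)  $L(\{\mu_g, \Sigma_g\}; Z) = \prod_{g=1}^{K} L(\mu_g, \Sigma_g; Z_g),$

where $Z_g$ denotes the observations of the g-th group, or by

(5.8)  $L = (2\pi)^{-np/2} \prod_{g=1}^{K} |\Sigma_g|^{-n_g/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr} \sum_{g=1}^{K} \Sigma_g^{-1} A_g - \tfrac{1}{2}\,\mathrm{tr} \sum_{g=1}^{K} n_g \Sigma_g^{-1} (\bar{z}_g - \mu_g)(\bar{z}_g - \mu_g)'\right\},$

where $n = \sum_{g=1}^{K} n_g$ and
$A_g = \sum_{i=1}^{n_g} (z_{gi} - \bar{z}_g)(z_{gi} - \bar{z}_g)'$.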

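The log likelihood function is

(5.9)  $l(\{\mu_g, \Sigma_g\}; Z) \equiv \log_e L = -\frac{np}{2}\log(2\pi) - \tfrac{1}{2} \sum_{g=1}^{K} n_g \log|\Sigma_g| - \tfrac{1}{2}\,\mathrm{tr} \sum_{g=1}^{K} \Sigma_g^{-1} A_g - \tfrac{1}{2}\,\mathrm{tr} \sum_{g=1}^{K} n_g \Sigma_g^{-1} (\bar{z}_g - \mu_g)(\bar{z}_g - \mu_g)'.$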

Since the common covariance matrix is $\Sigma$, the log likelihood function
becomes

(5.10)  $l(\{\mu_g\}, \Sigma; Z) \equiv \log_e L(\{\mu_g\}, \Sigma; Z) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \tfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1} W - \tfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1} \sum_{g=1}^{K} n_g (\bar{z}_g - \mu_g)(\bar{z}_g - \mu_g)',$

and the maximum-likelihood estimates (MLE's) of $\mu_g$ and $\Sigma$ are

(5.11)  $\hat{\mu}_g = \bar{z}_g, \quad g = 1, 2, \ldots, K,$

and

(5.12)  $\hat{\Sigma} = n^{-1} W,$

where $W = \sum_{g=1}^{K} A_g$.

Substituting these back into (5.10) and simplifying, the maximized log
likelihood becomes

(5.13)  $l(\{\hat{\mu}_g\}, \hat{\Sigma}; Z) \equiv \log L(\{\hat{\mu}_g\}, \hat{\Sigma}; Z) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|n^{-1}W| - \frac{np}{2},$

where W is the "within-group" SSP matrix.

Since

(5.14)  $\mathrm{AIC} = -2 \log_e L(\hat{\theta}) + 2m,$

where m = kp + p(p+1)/2 is the number of parameters, then AIC becomes

(5.15)  $\mathrm{AIC} = np \log(2\pi) + n \log|n^{-1}W| + np + 2\left[kp + \frac{p(p+1)}{2}\right].$

Since the constants do not affect the result of comparison of models, we
could ignore them and reduce the form of AIC to a much simpler form

(5.16)  $\mathrm{AIC}^* = n \log_e|W| + 2\left[kp + \frac{p(p+1)}{2}\right],$

where $n = \sum_{g=1}^{K} n_g$ = the total sample size,
      |W| = the determinant of the "within-group" SSP matrix,
      k = number of groups or samples compared,
      p = number of variables.

However, for purposes of comparison we retain the constants and use AIC.
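
The multivariate criterion (5.15) can be evaluated in the same way once the within-group SSP matrix W is formed. The sketch below is a minimal pure-Python illustration under stated assumptions: the function names `within_ssp`, `det`, and `manova_aic` are hypothetical, the determinant is computed by plain Gaussian elimination, and the two bivariate samples are made-up data rather than data from this paper.

```python
import math


def within_ssp(clusters):
    """W = sum over clusters of sum_i (z - mean)(z - mean)' for p-variate rows."""
    p = len(clusters[0][0])
    W = [[0.0] * p for _ in range(p)]
    for rows in clusters:
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for r in rows:
            d = [r[j] - mean[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    W[a][b] += d[a] * d[b]
    return W


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    size, result = len(a), 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, size):
            f = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return result


def manova_aic(clusters):
    """AIC as in (5.15): np log(2 pi) + n log|W/n| + np + 2 [kp + p(p+1)/2]."""
    n = sum(len(rows) for rows in clusters)
    k, p = len(clusters), len(clusters[0][0])
    scaled = [[w / n for w in row] for row in within_ssp(clusters)]
    m = k * p + p * (p + 1) / 2
    return n * p * math.log(2 * math.pi) + n * math.log(det(scaled)) + n * p + 2 * m


# Hypothetical bivariate samples with well separated mean vectors.
g1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
g2 = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
aic_separate = manova_aic([g1, g2])   # (1) (2)
aic_merged = manova_aic([g1 + g2])    # (1 2)
print(aic_separate, aic_merged)
```

In this made-up example the two samples are far apart, so pooling them inflates |W| and the separate clustering has the smaller AIC.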

6. Numerical Examples of Multi-Sample Cluster Analysis on Real Data Sets

In this section we shall give numerical examples of both univariate and
multivariate multi-sample data, cluster the groups or samples, and choose
the best clusterings by using Akaike's Information Criterion (AIC) as derived
in Sections 4 and 5.

Our  computations  were  carried  out  for  all  the  examples  we  shall  present 
here  on  an  IBM  370,  using  various  statistical  software  packages  such  as 
MINITAB,  SPSS,  and  SPEAKEASY  (VM/CMS  version). 

6.1.  Univariate  Examples 

For the univariate numerical examples we shall illustrate our results on
two data sets, a biomedical data set of Dolkart, Halpern, and Perlman [8] and
the Iris data of Fisher [10]. Here we shall take 150 Iris specimens on each of
the four morphological variables, sepal length and width and petal length and
width, and demonstrate our results on these variables individually rather than
considering all of them together.

Example 6.1. (Brown and Hollander [5]) Antibody Responses in Three Groups of
Mice: "Dolkart, Halpern, and Perlman [8] compared antibody responses in
normal and alloxan diabetic mice. Their investigation was designed to study
the circulating antibody response in alloxan diabetic, insulin-treated
diabetic and normal CF-1 mice injected with serum albumin.

"Only those animals treated with alloxan who had elevated serum glucose
levels (250 mg/100 ml or higher) were included in the study, together with a
group of normal animals. Animals were bled from the orbital sinus, and the
serum analyzed for antigen binding capacity of BSA, glucose concentration, and
serum proteins. BSA was iodinated with I-131, and the antigen-binding
capacity of each serum sample was determined as micrograms of BSA nitrogen
bound by 1 ml of undiluted serum." The data are given in Table 6.1.

TABLE 6.1. MICROGRAMS OF BSA NITROGEN BOUND PER ml OF UNDILUTED
MOUSE SERUM ON DAY 39, FOLLOWING INJECTION OF 5 mg
BSA ANTIGEN INTO EACH ANIMAL ON DAY 0 AND 28

Normal      Alloxan Diabetic      Alloxan Diabetic-
                                  Treated with Insulin
155.76          390.72                  82.50
282.00           46.20                  99.66
197.34          468.60                  97.66
297.00           86.46                 150.48
115.50          174.02                 242.88
126.72          132.66                  67.98
119.46           13.20                 227.70
 29.04          498.96                 130.68
252.78          167.64                  73.26
122.10           62.04                  17.82
349.14          127.38                  19.80
108.90          275.88                 100.32
143.22          176.22                  71.94
 64.02          145.86                 133.32
 25.54          108.24                 464.64
 85.80          275.88                  36.96
122.10           50.16                  46.20
454.85           72.60                  34.32
655.38                                  43.56
 13.86

Source: R.E. Dolkart, B. Halpern, and J. Perlman [8].


In this example we are given K = 3 groups or samples and we wish to cluster
them into k = 1, 2, and 3 homogeneous groups. From Table 3.1, as we know, there
are 5 total possible clustering alternatives, namely, (1) (2) (3) all separate,
and (1 2) (3), (1 3) (2), (2 3) (1), and (1 2 3) all together. Let us code
Normal as Group = 1, Alloxan Diabetic as Group = 2, and Alloxan
Diabetic-Treated with Insulin as Group = 3. Considering the ANOVA model as our
underlying model for comparisons of these groups, from a simple ANOVA run on
the computer we computed the AIC's for each of the 5 clustering alternatives.
The results are shown in Table 6.2.

TABLE 6.2. THE AIC'S FOR ANTIBODY RESPONSES IN THREE GROUPS OF MICE

Alternative  Clustering    $n\log_e(2\pi)$  $n\log_e(\mathrm{SSW}/n)$   n   k  2(k+1)  AIC
     1       (1) (2) (3)      104.758          559.139          57   3    8    728.897 (c)
     2       (1 2) (3)        104.758          559.149          57   2    6    726.907 (a)
     3       (1 3) (2)        104.758          561.945          57   2    6    729.703
     4       (2 3) (1)        104.758          561.513          57   2    6    729.271
     5       (1 2 3)          104.758          562.581          57   1    4    728.339 (b)

n = 20 + 18 + 19 = 57

$\mathrm{AIC} = n\log_e(2\pi) + n\log_e(\mathrm{SSW}/n) + n + 2(k+1)$
(a) First Minimum AIC
(b) Second Minimum AIC
(c) Third Minimum AIC
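
The AIC column of Table 6.2 can be checked by summing the tabulated components. The following is a minimal Python sketch of that arithmetic; it uses only the component values printed in the table (it does not recompute SSW from the raw data), and the names `rows`, `aic_from_components`, and `ranked` are illustrative.

```python
# (alternative, clustering, n log(SSW / n), k) as printed in Table 6.2.
rows = [
    (1, "(1) (2) (3)", 559.139, 3),
    (2, "(1 2) (3)", 559.149, 2),
    (3, "(1 3) (2)", 561.945, 2),
    (4, "(2 3) (1)", 561.513, 2),
    (5, "(1 2 3)", 562.581, 1),
]
N = 57               # total sample size, 20 + 18 + 19
N_LOG_2PI = 104.758  # n log(2 pi) as printed in the table


def aic_from_components(n_log_var: float, k: int) -> float:
    """AIC = n log(2 pi) + n log(SSW / n) + n + 2 (k + 1)."""
    return N_LOG_2PI + n_log_var + N + 2 * (k + 1)


# Rank the alternatives from smallest to largest AIC.
ranked = sorted(rows, key=lambda r: aic_from_components(r[2], r[3]))
for alt, clustering, n_log_var, k in ranked:
    print(alt, clustering, round(aic_from_components(n_log_var, k), 3))
```

Sorting by the recomputed values places alternative 2 first, followed by alternatives 5 and 1, in agreement with the first, second, and third minimum AIC marked in the table.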

In this example the first minimum AIC occurs at alternative submodel 2. That
is, the MAICE is submodel 2, indicating that, in terms of clustering, Normal
Group = 1 and Alloxan Diabetic Group = 2 should be clustered together, and
Alloxan Diabetic-Treated with Insulin Group = 3 should be clustered by itself.
In terms of a hypothesis on means, (1 2) (3) therefore corresponds to
$\mu_1 = \mu_2 \neq \mu_3$, indicating that the Normal and Alloxan Diabetic
Groups form the best homogeneous set in terms of their nitrogen-binding
capacities, and the Alloxan Diabetic-Treated with Insulin Group forms a set by
itself. On the other hand, the second minimum AIC occurs at alternative
submodel 5, and the third minimum AIC at alternative submodel 1, indicating
that the next-best choices are either to cluster all the groups together or to
treat each group separately. But if we were to compare each group separately
to the Normal Group = 1, then we should choose Normal Group = 1 together with
Alloxan Diabetic Group = 2 as the best choice by the minimum AIC procedure.
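
The AIC column of Table 6.2 and the ranking just discussed can be checked with a few lines of code. The following is only a minimal sketch, not the original computation: it takes the tabulated n log_e(SSW/n) terms as given and applies the formula printed under the table. The names `ALTERNATIVES` and `univariate_aic` are illustrative choices, not part of the paper.

```python
import math

N = 20 + 18 + 19  # total number of mice in the three groups

# alternative number -> (clustering, number of clusters k, n*log_e(SSW/n) from Table 6.2)
ALTERNATIVES = {
    1: ("(1) (2) (3)", 3, 559.139),
    2: ("(1 2) (3)", 2, 559.149),
    3: ("(1 3) (2)", 2, 561.945),
    4: ("(2 3) (1)", 2, 561.513),
    5: ("(1 2 3)", 1, 562.581),
}


def univariate_aic(n, k, n_log_ssw_over_n):
    """AIC = n log(2 pi) + n log(SSW/n) + n + 2(k + 1)."""
    return n * math.log(2 * math.pi) + n_log_ssw_over_n + n + 2 * (k + 1)


aic = {alt: univariate_aic(N, k, term) for alt, (_, k, term) in ALTERNATIVES.items()}
ranking = sorted(aic, key=aic.get)  # alternatives ordered from smallest to largest AIC

for alt in ranking:
    print(alt, ALTERNATIVES[alt][0], round(aic[alt], 3))
```

Small differences in the third decimal place relative to the table are to be expected, since the table rounds n log_e(2π) before summing.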

Example 6.2. Clustering of Irises by Groups: As we mentioned in Example 1.1,
the Iris data set is composed of 150 Iris plants belonging to three groups or
species, namely Iris setosa (S), Iris versicolor (Ve), and Iris virginica (Vi),
measured on sepal and petal length and width. Each group is represented by 50
plants. The data set for the 150 irises is given in Table 6.3.

This data set has been studied quite extensively in classification and
cluster analysis since it was published by Fisher [10], and it is still being
used today as a "testing ground" for classification and clustering methods
proposed by many investigators, such as Friedman and Rubin [11], Kendall [13],
Solomon [15], Mezzich and Solomon [14], and many others, including the present
authors.

For each of the 150 plants we already know the group structure of the
Iris species, namely K = 3 groups or samples. Even though the two species Iris
setosa and Iris versicolor were found growing in the same colony, and Iris
virginica was found growing in a different colony, Fisher reports in his
linear discriminant analysis the complete separation of I. setosa from I.
versicolor and I. virginica. Since then other investigators, such as the ones
we mentioned above, have shown similar results in their studies.

TABLE 6.3. MEASUREMENTS ON THREE TYPES OF IRIS

[Table body not legible in this copy. Columns: sepal length, sepal width,
petal length, and petal width for each of Iris setosa, Iris versicolor, and
Iris virginica.]

With this in mind, let us take the K = 3 groups or species on each of the
variables separately and cluster them into k = 1, 2, and 3 homogeneous groups.
Since we are dealing with K = 3 groups, we know by now that there are 5
possible clustering alternatives in total. Denoting I. setosa by S, I.
versicolor by Ve, and I. virginica by Vi, we have (S) (Ve) (Vi), (S, Ve) (Vi),
(S, Vi) (Ve), (Ve, Vi) (S), and (S, Ve, Vi) as all the possible clustering
alternatives of the three Iris species. Using the ANOVA model as our
underlying model for comparisons of these Iris groups, we obtained the AIC's
for each of the 5 clustering alternatives on each of the four variables
separately from a simple ANOVA run with the SPSS MANOVA program, which
performs both univariate and multivariate linear estimation and tests of
hypotheses. We report our results for each of the four variables in Tables 6.4
to 6.7.
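
The five clustering alternatives listed above are simply all the ways of partitioning three labelled samples into non-empty clusters. As an illustrative sketch (the function name `set_partitions` is our own and not part of the original analysis), they can be enumerated recursively:

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing cluster in turn ...
        for i, cluster in enumerate(partition):
            yield partition[:i] + [[first] + cluster] + partition[i + 1:]
        # ... or let it start a new cluster of its own.
        yield [[first]] + partition


alternatives = list(set_partitions(["S", "Ve", "Vi"]))

# Print from the finest clustering (all separate) to the coarsest (all together).
for p in sorted(alternatives, key=len, reverse=True):
    print(" ".join("(" + ", ".join(cluster) + ")" for cluster in p))
```

For K = 3 this produces 5 partitions, in agreement with the count used in the text; the count grows quickly with K (15 for K = 4), so exhaustive enumeration of this kind is practical only for a modest number of samples.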


TABLE 6.4. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE SEPAL LENGTH

Alternative  Clustering     n log_e(2π)  n log_e(SSW/n)  n    k  2(k+1)  AIC
1            (S) (Ve) (Vi)  275.681      -200.295        150  3  8       233.386a
2            (S, Ve) (Vi)   275.681      -135.669        150  2  6       296.012
3            (S, Vi) (Ve)   275.681      -58.550         150  2  6       373.131
4            (Ve, Vi) (S)   275.681      -163.740        150  2  6       267.941b
5            (S, Ve, Vi)    275.681      -56.966         150  1  4       372.715

TABLE 6.5. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE SEPAL WIDTH

Alternative  Clustering     n log_e(2π)  n log_e(SSW/n)  n    k  2(k+1)  AIC
1            (S) (Ve) (Vi)  275.681      -326.949        150  3  8       106.732a
2            (S, Ve) (Vi)   275.681      -252.915        150  2  6       178.766
3            (S, Vi) (Ve)   275.681      -287.157        150  2  6       144.524
4            (Ve, Vi) (S)   275.681      -318.019        150  2  6       113.662b
5            (S, Ve, Vi)    275.681      -250.129        150  1  4       179.552

TABLE 6.6. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE PETAL LENGTH

Alternative  Clustering     n log_e(2π)  n log_e(SSW/n)  n    k  2(k+1)  AIC
1            (S) (Ve) (Vi)  275.681      -255.988        150  3  8       177.693a
2            (S, Ve) (Vi)   275.681      59.442          150  2  6       491.123
3            (S, Vi) (Ve)   275.681      163.259         150  2  6       594.940
4            (Ve, Vi) (S)   275.681      -116.579        150  2  6       315.102b
5            (S, Ve, Vi)    275.681      169.493         150  1  4       599.174

TABLE 6.7. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE PETAL WIDTH

Alternative  Clustering     n log_e(2π)  n log_e(SSW/n)  n    k  2(k+1)  AIC
1            (S) (Ve) (Vi)  275.681      -478.966        150  3  8       -45.285a
2            (S, Ve) (Vi)   275.681      -216.942        150  2  6       214.739
3            (S, Vi) (Ve)   275.681      -84.552         150  2  6       347.129
4            (Ve, Vi) (S)   275.681      -314.688        150  2  6       116.993b
5            (S, Ve, Vi)    275.681      -82.452         150  1  4       347.229

$\mathrm{AIC} = n\log_e(2\pi) + n\log_e(\mathrm{SSW}/n) + n + 2(k+1)$
a First Minimum AIC
b Second Minimum AIC


Looking at each of the tables above, we see that on each of the variables
the first minimum AIC occurs at alternative submodel 1, namely (S) (Ve) (Vi).
That is, the MAICE is submodel 1, indicating that there are indeed three
distinct species on every variable. The second minimum AIC is at alternative
submodel 4, again on all the variables, indicating that if we were to cluster
any of the Iris species, we should cluster I. versicolor and I. virginica
together as one homogeneous group.
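
The statement that submodels 1 and 4 give the first and second minimum AIC on every variable can be checked directly from the tabulated terms. The sketch below is illustrative only: it copies the n log_e(SSW/n) column of Tables 6.4 to 6.7 and recomputes the AIC with the formula printed under the tables. The names `TERMS` and `two_best` are our own.

```python
import math

N = 150              # 50 plants per species
K = [3, 2, 2, 2, 1]  # number of clusters in alternatives 1..5

# n*log_e(SSW/n) for alternatives 1..5, copied from Tables 6.4 to 6.7.
TERMS = {
    "sepal length": [-200.295, -135.669, -58.550, -163.740, -56.966],
    "sepal width": [-326.949, -252.915, -287.157, -318.019, -250.129],
    "petal length": [-255.988, 59.442, 163.259, -116.579, 169.493],
    "petal width": [-478.966, -216.942, -84.552, -314.688, -82.452],
}


def two_best(terms):
    """Return ([best, second-best] alternative numbers, list of AIC values)."""
    aic = [N * math.log(2 * math.pi) + t + N + 2 * (k + 1) for t, k in zip(terms, K)]
    order = sorted(range(1, 6), key=lambda alt: aic[alt - 1])
    return order[:2], aic


for variable, terms in TERMS.items():
    best, aic = two_best(terms)
    print(variable, best, round(min(aic), 3))
```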

Thus our minimum AIC results for each of the variables confirm other
investigators' findings, including Fisher's results on the Iris data.
Moreover, if we were to choose among the submodels, then we would choose the
one with the smallest minimum AIC as the best submodel. Examining Tables 6.4,
6.5, 6.6, and 6.7, we see that the smallest minimum AIC occurs at submodel 1 in
Table 6.7, on variable petal width. This indicates that petal width alone
separates the three Iris species with virtual certainty, confirming again
Fisher's results (see, e.g., Fisher [10]).
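
In Tables 6.4 to 6.7 the n log_e(SSW/n) term is taken from the ANOVA output. As a rough sketch of how such a term might be computed from raw samples, we assume here that SSW for a clustering alternative is the within-cluster sum of squares after pooling the samples assigned to each cluster. The functions `pooled_ssw` and `univariate_aic` and the data in `toy` are hypothetical and for illustration only; they are not the Iris measurements.

```python
import math


def pooled_ssw(samples, clustering):
    """Within-cluster sum of squares after merging the samples in each cluster."""
    ssw = 0.0
    for cluster in clustering:
        merged = [x for name in cluster for x in samples[name]]
        mean = sum(merged) / len(merged)
        ssw += sum((x - mean) ** 2 for x in merged)
    return ssw


def univariate_aic(samples, clustering):
    """AIC = n log(2 pi) + n log(SSW/n) + n + 2(k + 1) for one clustering."""
    n = sum(len(values) for values in samples.values())
    k = len(clustering)
    ssw = pooled_ssw(samples, clustering)
    return n * math.log(2 * math.pi) + n * math.log(ssw / n) + n + 2 * (k + 1)


# Made-up measurements for three samples (illustration only).
toy = {
    "A": [1.0, 1.2, 0.9, 1.1],
    "B": [1.1, 1.0, 1.3, 0.8],
    "C": [3.0, 3.2, 2.9, 3.1],
}
candidates = [
    [["A"], ["B"], ["C"]],
    [["A", "B"], ["C"]],
    [["A", "C"], ["B"]],
    [["B", "C"], ["A"]],
    [["A", "B", "C"]],
]
best = min(candidates, key=lambda c: univariate_aic(toy, c))
print(best)
```

In this toy data samples A and B have the same mean, so merging them leaves SSW unchanged while saving one parameter, and the minimum AIC falls on the alternative that clusters A and B together.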


6.2.  A  Multivariate  Example 

Now we consider the Fisher Iris data again, and this time we cluster the
K = 3 groups or species into k = 1, 2, and 3 homogeneous groups on the basis of
all four variables, assuming the MANOVA model as the underlying model for
comparisons of these three Iris groups. Running the SPSS MANOVA program on the
Iris data, we obtain the following "within-group" sum of squares and products
(SSP) matrices for each of the clustering alternatives. These are:


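(1) (S) (Ve) (Vi)

$$
W = \begin{bmatrix}
39.462 & 13.818 & 24.729 & 5.6554 \\
13.818 & 16.962 & 8.1208 & 4.8084 \\
24.729 & 8.1208 & 27.223 & 6.2718 \\
5.6554 & 4.8084 & 6.2718 & 6.1566
\end{bmatrix},
\qquad
150 \log_e \left| 150^{-1} W \right| = -1{,}504.2
$$

(2) (S, Ve) (Vi)

$$
W = \begin{bmatrix}
60.714 & -1.3489 & 89.222 & 30.549 \\
-1.3489 & 27.786 & -37.906 & -12.958 \\
89.222 & -37.906 & 222.94 & 81.818 \\
30.549 & -12.958 & 81.818 & 35.317
\end{bmatrix},
\qquad
150 \log_e \left| 150^{-1} W \right| = -1{,}085.9
$$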


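(3) (S, Vi) (Ve)

$$
W = \begin{bmatrix}
101.52 & -4.3257 & 186.38 & 76.044 \\
-4.3257 & 22.115 & -38.301 & -15.395 \\
186.38 & -38.301 & 445.43 & 188.28 \\
76.044 & -15.395 & 188.28 & 85.367
\end{bmatrix},
\qquad
150 \log_e \left| 150^{-1} W \right| = -988.39
$$

(4) (Ve, Vi) (S)

$$
W = \begin{bmatrix}
50.352 & 17.184 & 46.047 & 17.205 \\
17.184 & 18.002 & 14.71 & 8.3784 \\
46.047 & 14.71 & 68.954 & 28.882 \\
17.205 & 8.3784 & 28.882 & 18.407
\end{bmatrix},
\qquad
150 \log_e \left| 150^{-1} W \right| = -1{,}299.6
$$

(5) (S, Ve, Vi)

$$
W = \begin{bmatrix}
102.6 & -6.0197 & 189.78 & 76.884 \\
-6.0197 & 28.307 & -49.119 & -18.124 \\
189.78 & -49.119 & 464.33 & 193.05 \\
76.884 & -18.124 & 193.05 & 86.57
\end{bmatrix},
\qquad
150 \log_e \left| 150^{-1} W \right| = -941.73
$$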
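
The determinant term n log_e|n^{-1}W| can be recomputed from a printed SSP matrix. The following is a minimal sketch under our own assumptions, not the SPEAKEASY routine used for the paper: it obtains the log-determinant from a hand-written Cholesky factorization and uses the entries of W for alternative (1) (S) (Ve) (Vi) as printed above. The helper names `log_det_spd` and `n_log_det_term` are illustrative.

```python
import math


def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    # det(A) = (product of the diagonal of L)^2, so log det(A) = 2 * sum(log L_ii).
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def n_log_det_term(w, n):
    """n * log_e |W / n| for a p x p within-group SSP matrix W."""
    p = len(w)
    return n * (log_det_spd(w) - p * math.log(n))


# Within-group SSP matrix for alternative (1) (S) (Ve) (Vi).
W1 = [
    [39.462, 13.818, 24.729, 5.6554],
    [13.818, 16.962, 8.1208, 4.8084],
    [24.729, 8.1208, 27.223, 6.2718],
    [5.6554, 4.8084, 6.2718, 6.1566],
]

term = n_log_det_term(W1, 150)
print(round(term, 1))
```

Because the matrix entries are printed to five significant figures, the recomputed term should agree with the reported value of -1,504.2 only to about the first decimal place.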

After carrying out all our computations for each of the clustering
alternatives (using the matrix algebra routines in the SPEAKEASY interactive
computer package), we obtain the AIC's from (5.15). The results are shown in
Table 6.8.

TABLE 6.8. THE AIC'S FOR IRISES BY GROUPS ON ALL VARIABLES

Alternative  Clustering     np log_e(2π)  n log_e|n^{-1}W|  np   k  2m  AIC
1            (S) (Ve) (Vi)  1,102.724     -1,504.2          600  3  44  242.524a
2            (S, Ve) (Vi)   1,102.724     -1,085.9          600  2  36  652.824
3            (S, Vi) (Ve)   1,102.724     -988.39           600  2  36  750.334
4            (Ve, Vi) (S)   1,102.724     -1,299.6          600  2  36  439.124b
5            (S, Ve, Vi)    1,102.724     -941.73           600  1  28  788.994

n = 150 plants, p = 4 variables
m = kp + p(p+1)/2 parameters
$\mathrm{AIC} = np\log_e(2\pi) + n\log_e\left|n^{-1}W\right| + np + 2m$
a First Minimum AIC
b Second Minimum AIC


Hence, looking at Table 6.8, we see that, using all four variables
simultaneously, the first minimum AIC occurs at alternative submodel 1, that
is, when (S) (Ve) (Vi) are all clustered separately. This indicates again that
there are indeed three distinct species. Therefore, the MAICE is submodel 1.
Not surprisingly, the second minimum AIC occurs at alternative submodel 4,
telling us that if we were to cluster any two of the Iris groups, we should
cluster I. versicolor and I. virginica together as one homogeneous group, and
keep I. setosa entirely separate as a group by itself.
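
The AIC column of Table 6.8 follows from the tabulated determinant terms and the parameter count m = kp + p(p+1)/2. As an illustrative sketch only (the names `ROWS`, `n_parameters`, and `multivariate_aic` are our own), the column and the resulting ranking can be recomputed as follows:

```python
import math

N, P = 150, 4  # number of plants, number of variables

# clustering -> (number of clusters k, n*log_e|W/n| as printed in Table 6.8)
ROWS = {
    "(S) (Ve) (Vi)": (3, -1504.2),
    "(S, Ve) (Vi)": (2, -1085.9),
    "(S, Vi) (Ve)": (2, -988.39),
    "(Ve, Vi) (S)": (2, -1299.6),
    "(S, Ve, Vi)": (1, -941.73),
}


def n_parameters(k, p):
    """m = k*p mean parameters plus p(p+1)/2 covariance parameters."""
    return k * p + p * (p + 1) // 2


def multivariate_aic(n, p, k, n_log_det):
    """AIC = np log(2 pi) + n log|W/n| + np + 2m."""
    return n * p * math.log(2 * math.pi) + n_log_det + n * p + 2 * n_parameters(k, p)


aic = {name: multivariate_aic(N, P, k, term) for name, (k, term) in ROWS.items()}
ranking = sorted(aic, key=aic.get)  # clusterings from smallest to largest AIC

for name in ranking:
    print(f"{name:15s} {aic[name]:9.3f}")
```

With k = 3, 2, and 1 clusters and p = 4, the parameter counts are m = 22, 18, and 14, which gives the 2m column of 44, 36, and 28 shown in the table.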

Here, it is important to note that we obtained the same results when we
used the four variables separately in our computation of AIC in the previous
section, which is encouraging.

Thus, in conclusion, we see from these numerical results that AIC, and
consequently the minimum AIC procedure, is very successful in identifying the
best clustering alternatives when we cluster samples into homogeneous sets, in
both the univariate and the multivariate cases.

Moreover, the definition of MAICE gives a clear formulation of the
principle of parsimony in statistical model building or comparison, as the
above examples demonstrate. MAICE also provides a versatile procedure for
statistical model identification which is free from the ambiguities inherent
in the application of conventional statistical procedures.

